9.4 Likelihood
correctness of a model. Since that appears to introduce a wildly fluctuating subjectivity into the calculations, it seems more reasonable to regard that as a fatal weakness of the method.18
To reiterate: our purpose is to find the most likely explanation of a set of observations, that is, a description that is simpler, hence shorter, than the set of facts observed to have occurred.19
The three pillars of statistical inference are as follows:
1. A statistical model: that part of the description that is not (at least at present) in question (corresponding to K in Eq. 6.13);
2. The data: that which has been observed or measured (unconditional information);
3. The statistical hypothesis: the attribution of particular values to the unknown
parameters of the model that are under investigation (conditional information).
The preferred values of those parameters are then those that maximize the likelihood
of the model, likelihood being defined in the following:
Definition. The likelihood L(H|R) of the hypothesis H given data R and a specific model is proportional to P(R|H), the constant of proportionality being arbitrary but constant in any one application (i.e., with the same model and the same data, but different hypotheses).
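As a concrete illustration (not from the text), the definition can be exercised with a simple binomial coin-flip model: the data R are held fixed, and the likelihood is scanned as a function of the hypothesis H, here a candidate value of the unknown parameter p. All numbers are illustrative.

```python
from math import comb

def likelihood(p, heads, flips):
    """Binomial likelihood of hypothesis p (probability of heads)
    given the fixed data R: `heads` successes in `flips` trials.
    The factor comb(flips, heads) is the arbitrary constant of
    proportionality; it cancels in any likelihood ratio."""
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

# Data R: 7 heads in 10 flips (illustrative numbers).
# Hypotheses H: a grid of candidate values for the unknown parameter p.
candidates = [i / 100 for i in range(1, 100)]
best = max(candidates, key=lambda p: likelihood(p, 7, 10))
print(best)  # the maximum-likelihood value on this grid: 0.7
```

Note that only H varies across the grid; R stays fixed, which is exactly the sense in which likelihood differs from probability.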
The arbitrariness of the constant of proportionality is of no concern since, in practice, likelihood ratios are taken, as in the following.
Definition. The likelihood ratio of two hypotheses on some data is the ratio of their likelihoods on that data. It will be denoted as L(H1, H2|R). The likelihood ratios of two hypotheses on independent sets of data may be multiplied together to form the likelihood ratio on the combined data:

L(H1, H2|R1 & R2) = L(H1, H2|R1) × L(H1, H2|R2) .    (9.50)
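Equation (9.50) can be verified numerically. The sketch below assumes a binomial (coin-flip) model with illustrative counts, none of which come from the text; the two data sets are independent, so the pooled ratio equals the product of the separate ratios.

```python
from math import comb

def L(p, heads, flips):
    # Binomial likelihood; the constant comb(...) cancels in the ratio below.
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

def LR(p1, p2, heads, flips):
    # Likelihood ratio L(H1, H2 | R) of hypotheses p1 vs p2 on one data set.
    return L(p1, heads, flips) / L(p2, heads, flips)

# Two independent data sets R1, R2 (heads, flips) -- illustrative numbers.
r1, r2 = (7, 10), (12, 20)
p1, p2 = 0.7, 0.5

combined = LR(p1, p2, 7 + 12, 10 + 20)     # ratio on the pooled data R1 & R2
product = LR(p1, p2, *r1) * LR(p1, p2, *r2)  # product of the separate ratios
print(abs(combined - product) < 1e-9)  # True, as Eq. (9.50) requires
```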
The fundamental difference between probability and likelihood is that in the inverse probability approach R is variable and H constant, whereas in likelihood, H is variable and R constant. In other words, likelihood is predicated on a fixed R.
We shall sometimes need to recall that if R1 and R2 are two possible, mutually exclusive, results and P{R|H} is the probability of obtaining the result R given H, then

P{R1 or R2|H} = P{R1|H} + P{R2|H}    (9.51)
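A minimal numerical check of Eq. (9.51), assuming a fair-coin binomial model with illustrative numbers not taken from the text: the results "exactly 7 heads" and "exactly 8 heads" in 10 flips are mutually exclusive, so their probabilities simply add.

```python
from math import comb
from itertools import product

def P(heads, flips, p):
    # P{R|H}: probability of exactly `heads` heads in `flips` flips,
    # given the hypothesis that each flip lands heads with probability p.
    return comb(flips, heads) * p**heads * (1 - p)**(flips - heads)

# R1 = "exactly 7 heads", R2 = "exactly 8 heads": mutually exclusive results.
p_either = P(7, 10, 0.5) + P(8, 10, 0.5)   # Eq. (9.51): probabilities add

# Cross-check by enumerating all 2**10 equally likely flip sequences.
brute = sum(1 for s in product("HT", repeat=10) if s.count("H") in (7, 8)) / 2**10
print(p_either, brute)  # both equal 165/1024
```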
18 As Fisher and others have pointed out, it is not strictly correct to associate Bayes with the inverse
probability method. Bayes’ doubts as to its validity led him to withhold publication of his work (it
was published posthumously).
19 Sometimes brevity is taken as the main criterion. This is the minimum description length (MDL)
approach. See also the discussion in Sects. 7.4 and 11.5.